Reducing write tail latency in storage systems

ABSTRACT

To reduce write tail latency, a storage system generates redundant write requests when performing a storage operation for an object. The storage operation is determined to be effectively complete when a minimum number of write requests have completed. For example, the storage system may generate twelve write requests and also generate four redundant write requests for a total of sixteen write requests. The storage system considers the object successfully stored once twelve of the sixteen writes complete successfully. To generate the redundant writes, the storage system may use replication or erasure coding. For replication, the storage system may issue a redundant write request for each of n chunks being written. For erasure coding, the storage system may use rateless codes, which can generate an unlimited number of parity chunks, or use an n+k+k′ erasure code, which generates an additional k′ encoded chunks, in place of an n+k erasure code.

BACKGROUND

The disclosure generally relates to the field of computer storage systems, and more particularly to reducing write tail latency in a storage system.

Some storage systems store data across a number of storage devices, such as distributed storage systems or systems utilizing Redundant Array of Independent Disks (“RAID”) configurations. A data object stored in a storage system with multiple storage devices may be divided into chunks with each chunk written to a different storage device. To increase durability of data, the storage system may employ replication or erasure coding. When using replication, the storage system duplicates chunks of an object and sends write requests for the duplicate chunks to additional storage devices. When using erasure coding, the storage system uses an erasure code to algorithmically generate chunks for an object. In general, an erasure code transforms a data object consisting of n chunks of data into n+k chunks, where k is a number of encoded chunks used for data protection, and allows the data object to be reconstructed from any n chunks of the n+k chunks. For example, an 8+3 erasure code, i.e. n=8 and k=3, transforms a data object of eight chunks into eleven chunks of data. The data object may then be reconstructed from any eight of the eleven chunks. There are multiple types of erasure codes including systematic and non-systematic erasure codes. With a systematic erasure code, the original (non-encoded) n chunks of the data object are written to storage devices along with k encoded chunks. With non-systematic erasure codes, all of the n+k chunks are encoded. Another type of erasure code is a rateless erasure code or fountain code. Unlike erasure codes with an n+k rate, a rateless erasure code can generate a theoretically unlimited number of chunks and still reconstruct the object using any n of the chunks.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the disclosure may be better understood by referencing the accompanying drawings.

FIG. 1 depicts an example storage system that generates redundant writes to reduce write tail latency using replication.

FIG. 2 depicts an example storage system that generates redundant writes to reduce write tail latency using an erasure code and replication.

FIG. 3 depicts an example storage system that generates redundant writes to reduce write tail latency using an erasure code.

FIG. 4 depicts a flowchart illustrating example operations for storing an object in a storage system using redundant write requests.

FIG. 5 depicts a flowchart that illustrates example operations for storing an object in a storage system using a dynamic number of redundant write requests.

FIG. 6 depicts an example computer system with a redundant write manager.

FIG. 7 depicts an example distributed storage system with a storage controller that includes a write request manager.

DESCRIPTION

The description that follows includes example systems, methods, techniques, and program flows that embody aspects of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. For instance, this disclosure refers to types of erasure coding in illustrative examples. But aspects of this disclosure can utilize other data durability methods and other types of erasure codes such as parity blocks, Reed-Solomon encoding, Raptor codes, etc. In other instances, well-known instruction instances, protocols, structures and techniques have not been shown in detail in order not to obfuscate the description.

Terminology

This description uses the term “chunk” to refer to a discrete unit of data. A chunk may also be referred to as a fragment, segment, block, extent, data element, etc. The use of the term “chunk” does not connote any particular size or format, as the size or format of a chunk can vary based on an encoding scheme, file system, block size, etc.

This description uses the term “write request” to refer to a message instructing a storage device to store or write data. A write request may also be referred to as a write operation, write command, or write instruction. A write request may comply with, or be sent in accordance with, any of various storage protocols such as Hypertext Transfer Protocol (“HTTP”) REST protocol, Small Computer System Interface (“SCSI”), Internet Small Computer System Interface (“iSCSI”), etc. A write request may be sent to a storage device along with data to be stored. The storage device responds or acknowledges once the write request is complete, i.e. once the data has been stored or written to the storage device.

Introduction

When writing data to a single storage device, the period required to complete the write operation is generally predictable, with a delay or latency occurring on a low percentage of writes. However, in a storage system with multiple storage devices, the chance of delay for a write request increases with the scale and complexity of the system. The occasional long delay, known as write tail latency (i.e., a latency reading that falls in the tail of its latency distribution curve), arises because a storage operation waits for writes to complete at multiple storage devices, which increases the odds of protracted write latency or failure. For example, when eight chunks of a data object are written to eight different storage devices, the data object is not acknowledged as stored until all eight storage devices have successfully written their corresponding chunk. If any of the eight storage devices has a high latency, the storage operation for the data object suffers the same latency.
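The effect of scale on tail latency can be illustrated with a short calculation. The following is a minimal sketch, assuming that each device is slow independently with the same probability; the function name and the 1% figure are illustrative assumptions, not values from this disclosure.

```python
def prob_operation_is_slow(n_devices: int, p_slow: float) -> float:
    """Probability that at least one of n independent writes is slow.

    Assumes every device is slow with the same probability p_slow and
    that devices behave independently (a simplifying assumption).
    """
    return 1.0 - (1.0 - p_slow) ** n_devices


# With a hypothetical 1% chance of a slow write per device, waiting on
# more devices makes a slow storage operation noticeably more likely.
for n in (1, 8, 16):
    print(n, round(prob_operation_is_slow(n, 0.01), 4))
```

Under these assumptions, an operation that must wait for all eight devices is slow roughly 7.7% of the time even though each device is slow only 1% of the time.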

Overview

To reduce write tail latency, a storage system with multiple storage devices (“storage system”) generates redundant write requests when performing a storage operation for an object. The storage operation is determined to be effectively complete when a minimum number of write requests have completed. For example, the storage system may normally generate twelve write requests when storing the object. To decrease write tail latency, the storage system may also generate four redundant write requests for a total of sixteen write requests and consider the object successfully stored once twelve of the sixteen writes complete successfully. To generate the redundant writes, the storage system may use replication or erasure coding. For replication, the storage system may, for example, proactively issue a redundant write request for each of n chunks being written, resulting in a total of n+n writes. For erasure coding, the storage system may, for example, use an erasure code to generate an additional k′ encoded chunks and generate write requests for a total of n+k+k′ chunks. In some instances, the storage system may generate redundant write requests reactively by waiting for initial write requests to take longer than a threshold before generating redundant write requests.

Example Illustrations

FIG. 1 depicts an example storage system that generates redundant writes to reduce write tail latency using replication. FIG. 1 depicts a storage system 100 that includes a storage controller 110 and storage devices 104. The storage controller 110 includes a storage interface 102, a write request manager 103 (“write manager 103”), and a client interface 111.

The storage system 100 may be a distributed storage system, a RAID storage system, or other type of storage system that stores data across multiple storage devices, such as the storage devices 104. The storage controller 110 manages read and write commands for the storage system 100. The storage controller 110 communicates with the storage devices 104 through the storage interface 102. The storage interface 102 may be a SCSI, iSCSI, Advanced Host Controller Interface (“AHCI”), etc., and may communicate with the storage devices 104 through various protocols such as HTTP REST protocol, SCSI, iSCSI, etc. The storage interface 102 may be connected to the storage devices 104 through a local or remote connection or a hybrid of both local and remote connections. For example, the storage interface 102 may communicate with some of the storage devices 104 over a local area network and others of the storage devices 104 over a wide area network. Also, the storage interface 102 may, for example, communicate with the storage devices 104 through local connections such as Serial Advanced Technology Attachment (“SATA”) connections, SCSI connections, etc.

At stage A, the client interface 111 of the storage system 100 receives the object 101. The client interface 111 may receive the object 101 from a storage system client, an application or backup agent, another storage system, etc. The object 101 may be received along with a write command instructing the storage system 100 to store the object 101. In response to receiving the object 101, the client interface 111 or another component of the storage controller 110 divides the object 101 into the chunks 106 for storage. The number and size of the chunks 106 can vary based on the configuration of the storage system 100 or the storage controller 110, a size of the object 101, a number of the storage devices 104, etc. For example, if the storage devices 104 include four storage devices, the object 101 may be divided into four or fewer larger chunks. In FIG. 1, the chunks 106 consist of eight chunks, C0-C7.

At stage B, the write manager 103 sends write requests for the chunks 106 to the storage interface 102. The write manager 103 is a component that generates and monitors write requests for objects received by the storage system 100. In various instances, the functionality of the write manager 103 may be incorporated within the storage interface 102, implemented using software or hardware, etc. In FIG. 1, the write manager 103 generates write requests for the chunks C0-C7. The write manager 103 generates write requests in a protocol that is compatible with the storage interface 102, such as HTTP REST protocol, SCSI, iSCSI, etc.

At stage C, the storage interface 102 sends the write requests and the corresponding chunks 106 to the storage devices 104. The storage devices 104 may consist of or include multiple types of storage devices such as hard disks, all flash arrays, tape storage, etc. The storage devices 104 may also be clusters, nodes, or logical volumes that comprise hard disks, all flash arrays, tape storage, etc. FIG. 1 depicts the storage devices 104 as including twelve storage devices; however, the number of storage devices in the storage devices 104 can vary. The storage interface 102 sends each chunk of the chunks 106 and the corresponding write request to a different storage device of the storage devices 104 for storage, i.e. the eight chunks are sent to eight different storage devices. The storage interface 102 may select which of the storage devices 104 receive chunks based on a logical or physical ordering of the storage devices 104, a RAID configuration, past performance of the storage devices 104, write queues, etc. For example, the storage interface 102 may prioritize storage devices that have shorter write queues or that previously performed write requests faster or more reliably. Each of the storage devices 104 which received one of the chunks 106 attempts to store the chunk as indicated by the corresponding write request. The storage devices 104 acknowledge to the storage interface 102 once a write request is completed successfully.
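One of the selection policies mentioned above, prioritizing storage devices with shorter write queues, can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary of queue lengths, and the tie-breaking by device name are assumptions and are not part of the disclosure.

```python
from typing import Dict, List


def select_devices(queue_lengths: Dict[str, int], count: int) -> List[str]:
    """Pick `count` distinct devices, preferring shorter write queues.

    `queue_lengths` maps a hypothetical device identifier to the number
    of write requests currently queued on that device.
    """
    if count > len(queue_lengths):
        raise ValueError("not enough storage devices for one chunk per device")
    # Sort by queue length; break ties by name so the choice is deterministic.
    ranked = sorted(queue_lengths, key=lambda dev: (queue_lengths[dev], dev))
    return ranked[:count]


queues = {"dev0": 5, "dev1": 0, "dev2": 2, "dev3": 2}
print(select_devices(queues, 3))
```

Because each selected device appears once in the result, each chunk can be sent to a different storage device.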

After sending the chunks 106 and the write requests to the storage devices 104, the storage interface 102 begins tracking the period between when the request(s) were sent and when the recipient storage devices confirm completion of the write(s). If the period meets or exceeds a threshold, the storage interface 102 determines which of the chunks 106 have not been stored successfully. The threshold is a configurable time period or number of cycles and may be adjusted according to how much delay for a write request is acceptable for the storage system 100. Additionally, the threshold may be dynamically adjusted for each write request based on the size of chunks being written, the load of the storage system 100, or the type of storage devices in the storage devices 104. For example, the threshold may increase in proportion to an increase in a size of chunks or decrease if faster storage devices, such as solid state devices or all flash arrays, are being used.
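A dynamically adjusted threshold of this kind might be computed as in the following sketch. The formula, the parameter names, and the notion of a “speed factor” for faster devices are assumptions chosen for illustration; the disclosure does not prescribe a particular calculation.

```python
def write_threshold_ms(
    base_ms: float,
    chunk_bytes: int,
    reference_chunk_bytes: int,
    device_speed_factor: float = 1.0,
) -> float:
    """Scale a configured base threshold for one write request.

    The threshold grows in proportion to chunk size and shrinks for
    faster devices (device_speed_factor > 1.0, e.g. for flash storage).
    """
    if reference_chunk_bytes <= 0 or device_speed_factor <= 0:
        raise ValueError("reference size and speed factor must be positive")
    size_ratio = chunk_bytes / reference_chunk_bytes
    return base_ms * size_ratio / device_speed_factor


MIB = 1024 * 1024
# A chunk four times the reference size gets four times the base threshold.
print(write_threshold_ms(50.0, 4 * MIB, MIB))
# The same chunk on a device assumed to be twice as fast gets half of that.
print(write_threshold_ms(50.0, 4 * MIB, MIB, device_speed_factor=2.0))
```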

At stage D, the storage interface 102 indicates to the write manager 103 that the write requests for chunk C0 and chunk C7 of the chunks 106 have not completed within the threshold period. Once the threshold has been met or exceeded, the storage interface 102 identifies those of the chunks 106 which were not successfully stored or whose write requests failed to complete. In the example of FIG. 1, the storage interface 102 determines that chunks C0 and C7 of the chunks 106 were not successfully stored within the threshold and indicates this information to the write manager 103. The storage interface 102 may indicate the information by sending identifiers for the chunks C0 and C7 or forwarding information included in the corresponding write requests such as an identifier for the write requests. The storage interface 102 may also log which storage devices of the storage devices 104 failed to acknowledge within the threshold for future prioritization of the storage devices 104.

At stage E, the write manager 103 sends the redundant write requests for chunk C0 and chunk C7 of the chunks 106 to the storage interface 102. In FIG. 1, the write manager 103 uses replication to generate the redundant write requests by duplicating the initial write requests for chunks C0 and C7. The write manager 103 may modify the redundant write requests to indicate that the chunks C0 and C7 should be sent to different storage devices of the storage devices 104. For example, if chunk C0 was originally sent to a storage device A of the storage devices 104, the write manager 103 may indicate in the redundant write request for chunk C0 that the chunk should not be sent to the storage device A for storage. Alternatively, the write manager 103 may specify a storage device for the chunk C0 other than the storage device A or may rely on the storage interface 102 to select a different storage device of the storage devices 104.

At stage F, the storage interface 102 sends the redundant write requests for chunk C0 and chunk C7 of the chunks 106 to the storage devices 104. The storage interface 102 attempts to write the chunks C0 and C7 in a manner similar to that described at stage C. However, the storage interface 102 ensures that the chunks C0 and C7 are sent to different storage devices of the storage devices 104 than those of stage C. As a result, there will be two pending write requests for each of the chunk C0 and the chunk C7: the initial write request at stage C and the redundant write request at stage F. The storage interface 102 waits for either of the storage devices associated with the pending write requests to acknowledge that the associated chunk was stored successfully. In other words, the storage interface 102 utilizes the storage device that is quickest to respond. For example, the initial write request for chunk C0 may respond before the redundant write request, and the redundant write request for chunk C7 may respond before the original write request.

Similar to stage C, the storage interface 102 may track the period between when the pending request(s) were sent and when the recipient storage devices confirm completion of the write(s). If no write requests, either the original or redundant, respond within the threshold, the storage interface 102 may again indicate to the write manager 103 those write requests which failed to complete within the threshold.

Once the storage interface 102 receives a successful storage indication for each of the chunks 106, the storage interface 102 may cancel or preempt remaining write requests. Since a single storage location is used for each chunk of the chunks 106, additional write requests for a chunk that has already been stored may be unnecessary and may be canceled. For example, if an initial write request responds before a redundant write request, the storage interface 102 may cancel the redundant write request or preempt the redundant write request. Alternatively, the storage interface 102 may allow the write request to complete but mark the location as deleted or mark the location for later garbage collection.
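The reactive flow of stages C through F for a single chunk, including cancellation of the slower write request, can be sketched with Python's asyncio library. This is a simulation under stated assumptions: device writes are modeled as timed sleeps, and the names device_write and hedged_write and the delay table are hypothetical.

```python
import asyncio
from typing import Dict


async def device_write(device: str, chunk: bytes, delay: float) -> str:
    """Simulated write: sleeps for `delay` seconds, then acknowledges."""
    await asyncio.sleep(delay)
    return device


async def hedged_write(
    chunk: bytes, primary: str, backup: str,
    delays: Dict[str, float], threshold: float,
) -> str:
    """Return the device that stored the chunk first."""
    first = asyncio.create_task(device_write(primary, chunk, delays[primary]))
    # Wait up to the threshold for the initial write request.
    done, _ = await asyncio.wait({first}, timeout=threshold)
    if done:
        return first.result()
    # Threshold exceeded: issue a redundant write to a different device.
    second = asyncio.create_task(device_write(backup, chunk, delays[backup]))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel whichever request is still outstanding.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    winner = first if first in done else second
    return winner.result()


slow_primary = {"A": 0.5, "B": 0.01}
print(asyncio.run(hedged_write(b"C0", "A", "B", slow_primary, threshold=0.05)))
```

In this sketch the slower request is canceled outright; a variant could instead let it finish and mark the extra location for later garbage collection, as described above.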

At stage G, the storage interface 102 indicates locations of the chunks 106 to the write manager 103. The location for each chunk includes an identifier for the storage device on which the chunk is stored and a memory or storage address for the chunk. The identifier for the storage device may be a logical volume number, a disk name, network address, etc. The addresses for the chunks may be physical or virtual addresses. The storage interface 102 may receive the locations for the chunks 106 in response to the write requests, or the storage interface 102 may request the location for each of the chunks 106 from the associated storage device of the storage devices 104.

At stage H, the write manager 103 writes the locations of the chunks 106 to the index 105 and indicates that the object 101 was stored successfully. The index 105 may be a database, a log or table maintained in memory, etc. Although depicted as part of the storage controller 110, the index 105 may be maintained remotely on a server or connected storage device. The write manager 103 stores an identifier for the object 101 along with the locations of the chunks 106 in the index 105. Once the locations and the identifier have been stored, the write manager 103 may respond to a write command for the object 101 indicating that the object 101 was stored successfully in the storage system 100.

When a request to read the object 101 is received by the storage system 100, the storage controller 110 uses an identifier for the object 101 to look up the locations of the chunks 106 in the index 105. The storage controller 110 can then use the location information to submit read requests to the storage interface 102, or may use the location information to read the chunks 106 from the storage devices 104 directly. The storage controller 110 then reconstructs the object 101 by combining the chunks 106 and responds to the read request with the object 101.
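The role of the index in the write and read paths can be sketched with in-memory dictionaries. This is an illustrative model only: the Location tuple, the dictionary-based devices, and the function names are assumptions, and no replication or erasure coding is applied.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (device identifier, address on that device)


def split_into_chunks(data: bytes, count: int) -> List[bytes]:
    """Divide data into `count` chunks of (nearly) equal size."""
    size = -(-len(data) // count)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(count)]


def store_object(object_id: str, data: bytes,
                 devices: Dict[str, Dict[int, bytes]],
                 index: Dict[str, List[Location]], chunk_count: int) -> None:
    """Write one chunk per device and record the locations in the index."""
    if chunk_count > len(devices):
        raise ValueError("need at least one device per chunk")
    locations: List[Location] = []
    device_ids = sorted(devices)[:chunk_count]
    for device_id, chunk in zip(device_ids, split_into_chunks(data, chunk_count)):
        address = len(devices[device_id])  # next free slot on this device
        devices[device_id][address] = chunk
        locations.append((device_id, address))
    index[object_id] = locations


def read_object(object_id: str, devices: Dict[str, Dict[int, bytes]],
                index: Dict[str, List[Location]]) -> bytes:
    """Look up chunk locations by object identifier and recombine the chunks."""
    return b"".join(devices[dev][addr] for dev, addr in index[object_id])


devices = {f"dev{i}": {} for i in range(4)}
index: Dict[str, List[Location]] = {}
store_object("object-101", b"redundant writes", devices, index, chunk_count=4)
print(read_object("object-101", devices, index))
```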

In FIG. 1, the write manager 103 stores the locations of the chunks 106 in the index 105. In some instances, the locations for the chunks 106 may be prepended or appended to some or all of the chunks 106 in the storage devices 104. For example, after storing each of the chunks 106, the storage interface 102 may lazily (i.e., in write-back fashion) issue additional write requests to append the locations to each of the chunks 106, or the storage interface 102 may reserve space at the beginning of chunks to be updated with the locations of the chunks 106. The storage interface 102 may maintain a mapping between an identifier for the object 101 and a location of one of the chunks 106. The identifier for the object 101 may then be used to read the locations for the remaining chunks 106.

The storage system 100 described in FIG. 1 generates redundant write requests in a reactive manner in that the storage system 100 waits for a write request to fail to complete within a time threshold before generating and sending redundant write requests. In some instances, the storage system 100 may proactively generate redundant writes in that redundant write requests are generated concurrently with the initial write requests. Additionally, the storage interface 102 may concurrently send the initial write requests and the redundant write requests to the storage devices 104. For example, in addition to the eight write requests generated at stage C for the chunks 106, the write manager 103 in a proactive implementation may also generate eight redundant write requests for the chunks 106 for a total of sixteen write requests. The storage interface 102 sends the sixteen write requests and then utilizes the first write request to complete for each of the chunks 106.

FIG. 1 is annotated with a series of letters A-H. These letters represent stages of operations. Although these stages are ordered for this example, the stages illustrate one example to aid in understanding this disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.

FIG. 2 depicts an example storage system that generates redundant writes to reduce write tail latency using an erasure code and replication. FIG. 2 depicts a storage system 200 that includes a storage controller 210 and storage devices 204. The storage controller 210 includes a storage interface 202, a write request manager 203 (“write manager 203”), a data encoder 207, and a client interface 211.

At stage A, the client interface 211 receives the object 201 and divides the object 201 into chunks 206 in a manner similar to that described at stage A of FIG. 1. The client interface 211 also sends the chunks 206 to the data encoder 207.

At stage B, the data encoder 207 generates encoded chunks 208 including encoded chunks EC0, EC1, and EC2 based on the chunks 206. In FIG. 2, the data encoder 207 encodes the encoded chunks 208 using a systematic erasure code. The systematic erasure code uses a rate of 8+3 meaning that the object 201 may be recreated using any eight of the eleven chunks including the chunks 206 and the encoded chunks 208. In some instances, the data encoder 207 may encode the encoded chunks 208 using a rateless erasure code and may generate more than three chunks for the encoded chunks 208. Additionally, in other instances, the data encoder 207 may generate the encoded chunks 208 using another data durability technique such as parity blocks or generate the encoded chunks 208 in accordance with a RAID configuration.
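A full 8+3 erasure code is beyond the scope of a short example, so the following sketch substitutes the simplest systematic scheme, a single XOR parity chunk (an n+1 code), to illustrate how an object can be recovered when one chunk is missing. This is a deliberately simplified stand-in, not the encoder of FIG. 2; the function names are hypothetical and all chunks are assumed to have equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(chunks: List[bytes]) -> List[bytes]:
    """Systematic n+1 encoding: the n data chunks plus one parity chunk."""
    return list(chunks) + [reduce(xor_bytes, chunks)]


def reconstruct(stored: List[Optional[bytes]]) -> List[bytes]:
    """Recover the n data chunks when at most one stored chunk is None."""
    missing = [i for i, chunk in enumerate(stored) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover at most one lost chunk")
    repaired = list(stored)
    if missing:
        present = [chunk for chunk in stored if chunk is not None]
        # XOR of all surviving chunks equals the missing chunk.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity chunk


data = [b"abcd", b"efgh", b"ijkl"]
encoded = encode_with_parity(data)
encoded[1] = None  # simulate a chunk whose write never completed
print(reconstruct(encoded) == data)
```

Codes such as Reed-Solomon generalize this idea so that any n of the n+k chunks suffice, which is what allows a storage system to stop waiting once enough writes have completed.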

At stage C, the write manager 203 sends write requests for the chunks 206 and the encoded chunks 208 to the storage interface 202, and at stage D, the storage interface 202 forwards the write requests to the storage devices 204. These operations are performed in a manner similar to that described at stages B and C of FIG. 1.

At stage E, the storage interface 202 indicates to the write manager 203 that the write requests for chunk C7 of the chunks 206 and encoded chunk EC1 of the encoded chunks 208 did not complete within a time threshold. The storage interface 202 monitors the write requests and reports to the write manager 203 in a manner similar to that described at stage D of FIG. 1.

At stage F, the write manager 203 sends redundant write requests for chunk C7 of the chunks 206 and encoded chunk EC1 of the encoded chunks 208 to the storage interface 202. Similar to stage E of FIG. 1, the write manager 203 uses replication to generate the redundant write requests, so the write manager 203 replicates the initial write requests for chunk C7 and encoded chunk EC1.

At stage G, the storage interface 202 sends the redundant write requests for chunk C7 of the chunks 206 and encoded chunk EC1 of the encoded chunks 208 to the storage devices 204, and at stage H, the storage interface 202 indicates locations of the chunks 206 and the encoded chunks 208 to the write manager 203. At stage I, the write manager 203 writes the locations of the chunks 206 and the encoded chunks 208 to the index 205 and indicates that the object 201 was stored successfully. These operations are performed in a manner similar to that of stages F-H of FIG. 1.

Similar to the storage system 100 of FIG. 1, the storage system 200 generates redundant write requests in a reactive manner in that the storage system 200 waits for a write request to fail to complete within a time threshold before generating and sending redundant write requests. In some instances, the storage system 200 may proactively generate redundant write requests that are sent to the storage devices 204 concurrently with the initial write requests. For example, in addition to the eleven write requests generated at stage C for the chunks 206 and the encoded chunks 208, the write manager 203 in a proactive implementation may also generate eleven redundant write requests for the chunks 206 and the encoded chunks 208 for a total of twenty-two write requests. Alternatively, as described in more detail in FIG. 3, the data encoder 207 may use an erasure code to generate an additional k′ chunks at stage B, and the write manager 203 may concurrently generate and send n+k+k′ write requests to the storage devices 204.

FIG. 2 is annotated with a series of letters A-I. These letters represent stages of operations. Although these stages are ordered for this example, the stages illustrate one example to aid in understanding this disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.

FIG. 3 depicts an example storage system that generates redundant writes to reduce write tail latency using an erasure code. FIG. 3 depicts a storage system 300 that includes a storage controller 310 and storage devices 304. The storage controller 310 includes a storage interface 302, a write request manager 303 (“write manager 303”), a data encoder 307, and a client interface 311.

At stage A, the client interface 311 receives the object 301 and sends the object 301 to the data encoder 307 to generate the encoded chunks 306. Although not depicted, the client interface 311 may divide the object 301 into a number of chunks prior to sending the chunks to the data encoder 307. In FIG. 3, the data encoder 307 uses a non-systematic erasure code to encode the object 301 to create the encoded chunks 306. The storage system 300 may be configured to provide a specified n+k rate for data durability purposes or to comply with service level agreements. Even though the effective rate is n+k, the data encoder 307 may generate more than n+k chunks. The additional chunks may be referred to as k′ for a total of n+k+k′ chunks. The data encoder 307 may generate the k′ chunks using an erasure coding algorithm with more than k parity chunks or a rateless erasure code. When a rateless erasure code is used, a theoretically unlimited number of k′ parity chunks may be generated. The value of k′ may be a configured value smaller than the number of the storage devices 304, may vary so that n+k+k′ is equal to the number of the storage devices 304, or may vary in response to conditions of the storage system 300 such as a high or low volume of write requests. In FIG. 3, the storage system 300 provides a rate of 8+3 meaning that the object 301 may be recreated using any eight of the encoded chunks 306. While only eleven chunks are needed for an 8+3 rate, the data encoder 307 generates fourteen chunks for the encoded chunks 306 (i.e., n+k+k′=8+3+3) using an 8+6 encoding algorithm or a rateless erasure code algorithm. The encoded chunks 306 include chunks EC0-EC10, EC0′, EC1′, and EC2′.

At stage B, the write manager 303 sends write requests for the encoded chunks 306 to the storage interface 302, and at stage C, the storage interface 302 forwards the write requests for the encoded chunks 306 to the storage devices 304. These operations are performed in a manner similar to that described at stages B and C of FIG. 1. However, unlike the example storage systems of FIGS. 1 and 2, which reactively generate redundant write requests, the write manager 303 of the storage system 300 proactively generates redundant write requests. The write manager 303 generates write requests for all fourteen of the encoded chunks 306 even though eleven chunks is the minimum number of chunks required to be stored to provide the rate of 8+3. So, the write requests for the chunks EC0′, EC1′, and EC2′ may be considered to be redundant write requests.

At stage D, the storage interface 302 waits for a first eleven of the chunks 306 to be stored in the storage devices 304. Since the storage system 300 provides an effective rate of 8+3, the storage interface 302 does not wait for all fourteen write requests to complete. Instead, the storage interface 302 monitors the write requests and waits for eleven of the write requests for the encoded chunks 306 to acknowledge. The storage interface 302 indicates to the write manager 303 locations for the eleven chunks that were successfully stored.
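Waiting for the first eleven of fourteen acknowledgements can be sketched with asyncio as follows. As before, writes are simulated with sleeps; the function names and the delay values are illustrative assumptions rather than part of the disclosure.

```python
import asyncio
from typing import Dict, List


async def write_chunk(name: str, delay: float) -> str:
    """Simulated write of one encoded chunk; returns its name on success."""
    await asyncio.sleep(delay)
    return name


async def store_until_quorum(delays: Dict[str, float], required: int) -> List[str]:
    """Issue all writes at once and return once `required` have acknowledged."""
    if required > len(delays):
        raise ValueError("cannot require more acknowledgements than writes")
    tasks = [asyncio.create_task(write_chunk(n, d)) for n, d in delays.items()]
    acknowledged: List[str] = []
    for next_done in asyncio.as_completed(tasks):
        acknowledged.append(await next_done)
        if len(acknowledged) == required:
            break
    # The object is effectively stored; cancel writes still in flight.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return acknowledged


names = [f"EC{i}" for i in range(11)] + ["EC0'", "EC1'", "EC2'"]
slow = {"EC3", "EC7", "EC9"}  # hypothetical slow devices
delays = {n: (0.5 if n in slow else 0.01) for n in names}
stored = asyncio.run(store_until_quorum(delays, required=11))
print(sorted(stored))
```

In this run the three slow writes never gate completion: the eleven fast writes, including the three redundant chunks, satisfy the 8+3 rate.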

At stage E, the write manager 303 writes the locations of the encoded chunks 306 to the index 305 and indicates that the object 301 was stored successfully in a manner similar to that described at stage H of FIG. 1.

Although FIG. 3 depicts the data encoder 307 as using a non-systematic erasure code, other types of erasure codes may be used. In some instances, the data encoder 307 may use a systematic rateless erasure code and generate n+k+k′ chunks with n of the chunks being non-encoded data from the object 301. In other instances, the data encoder 307 may use regular (non-rateless) erasure codes to generate n+k+k′ chunks. For example, if the storage system 300 is configured to provide an effective rate of 6+2, the data encoder 307 may use an erasure code with a rate of 6+4, so k′ would have a value of two. In such an implementation, the storage interface 302 waits for eight of the ten total chunks to write successfully.

In some instances, the storage system 300 may reactively generate redundant writes using similar techniques as those described in FIGS. 1 and 2. For example, the storage system 300 may initially generate n+k write requests and not issue redundant write requests for the k′ chunks. Instead, the write manager 303 may generate redundant write requests using the k′ chunks in response to some of the n+k write requests not completing within a time threshold. So, in the illustration of FIG. 3, the storage system 300 would initially generate write requests for the chunks EC0-EC10 of the chunks 306. If, for example, the write requests for the chunks EC8 and EC9 failed to respond within the threshold period, the storage system 300 would then generate redundant write requests with the chunks EC0′ and EC1′.

FIG. 3 is annotated with a series of letters A-E. These letters represent stages of operations. Although these stages are ordered for this example, the stages illustrate one example to aid in understanding this disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.

FIG. 4 depicts a flowchart illustrating example operations for storing an object in a storage system using redundant write requests. The operations described in FIG. 4 are described as being performed by a storage system, such as the example storage systems depicted in FIGS. 1, 2, and 3.

At block 400, a storage system receives an object to be stored. The storage system may receive the object from a storage system client, an application or backup agent, etc. The object along with a request to store the object may be received via various communication protocols, such as HTTP REST protocols, iSCSI, etc. After the storage system receives the object, control flows to block 402.

At block 402, the storage system processes the object to generate chunks. The storage system may generate the chunks by dividing the object, using an erasure code, using block codes, etc. The chunks may include non-encoded data of the object, encoded data blocks, parity blocks, or other data used for data durability. The size and number of the chunks may vary based on a size of the object, a type of an erasure code used to generate the chunks, a type of data protection scheme, a number of storage devices in the storage system, and/or a configuration of the storage devices in the storage system. For example, the storage system may divide the object into chunks that are sized to be compatible with a block size used by the storage devices or file system of the storage system. When using erasure codes, the storage system may generate a number of chunks to satisfy an n+k rate of the erasure code, or may generate n+k+k′ chunks as described in FIG. 3. After the storage system generates the chunks, control flows to block 404.

At block 404, the storage system determines whether redundant write requests should be generated proactively or reactively. The storage system may make this determination based on a configuration setting or may make it dynamically based on a state or condition of the storage system. For example, the storage system may determine to generate redundant write requests reactively if a total number of pending write requests in the storage system or in a queue exceeds a threshold. Additionally, the storage system may track a number of redundant write requests that are reactively generated as objects are stored in the storage system. If the average number of reactive write requests exceeds a threshold, the storage system may switch to proactively generating redundant write requests to attempt to reduce the amount of time taken to store an object. If the storage system determines that redundant write requests should be generated reactively, control flows to block 412. If the storage system determines that redundant write requests should be generated proactively, control flows to block 406.
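The determination at block 404 might be sketched as below. This is a minimal illustration, not a definitive policy: the function name choose_mode, its parameters, the precedence given to the pending-write check, and the default of "reactive" when neither condition applies are all assumptions made for this example.

```python
def choose_mode(pending_writes, pending_limit,
                avg_reactive_writes, reactive_limit,
                configured_mode=None):
    """Return "proactive" or "reactive" for generating redundant writes."""
    # A configuration setting, when present, takes priority (assumption).
    if configured_mode is not None:
        return configured_mode
    # Many pending writes: avoid adding more load up front.
    if pending_writes > pending_limit:
        return "reactive"
    # Reactive writes are frequently needed: issue them up front instead.
    if avg_reactive_writes > reactive_limit:
        return "proactive"
    # Default chosen for this sketch only.
    return "reactive"
```

For example, choose_mode(10, 100, 3.5, 2.0) selects proactive generation because the average number of reactive write requests exceeds its limit while the queues are lightly loaded.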

At block 406, the storage system generates write requests and redundant write requests for the chunks. The write requests and redundant write requests conform to the protocols supported by the storage devices, such as SCSI, iSCSI, HTTP REST protocols, etc. The number of write requests and redundant write requests varies based on the number of chunks and the technique used to generate the chunks at block 402. For example, the storage system may divide the object into eight chunks and may not generate any additional chunks for data durability. In such an instance, the storage system generates eight write requests and uses replication to generate eight redundant write requests. As an additional example, the storage system may use an 8+3 erasure code to generate eleven chunks and generate eleven corresponding write requests. Furthermore, the storage system may also use replication to generate eleven redundant write requests for a total of twenty-two write requests. In some other instances, the storage system may use an n+k+k′ erasure code or a rateless erasure code to generate n+k+k′ chunks and generate n+k write requests and k′ redundant write requests. After the storage system generates the write requests and the redundant write requests, control flows to block 408.

At block 408, the storage system sends the write requests and the redundant write requests to storage devices. The write requests and the redundant write requests are each sent with their corresponding chunk to a different storage device of the storage system. The storage system may log the storage devices which received write requests or maintain write queues for each storage device. The storage system may use the log or write queues to load balance or select which storage devices receive write requests. After the storage system sends the write requests and the redundant write requests to the storage devices, control flows to block 410.

At block 410, the storage system receives write acknowledgments for a minimum number of write requests. The storage system does not wait for all write requests to acknowledge that they completed successfully. Instead, the storage system waits for the minimum number of write requests to complete, i.e. the storage system waits for the necessary number of chunks to be stored. For example, if the storage system generated n+k+k′ write requests, the storage system waits for the n+k write requests that are quickest to complete. The minimum number of write requests is equal to the minimum number of chunks that should be stored so that the object can be reconstructed at a later time. In instances where the storage system generates the redundant write requests using replication, the minimum number of write requests is equal to the number of chunks. However, the storage system ensures that the minimum total includes at least one write acknowledgment for each of the different chunks. If both a write request and a redundant write request for the same chunk complete simultaneously or complete before other chunks are stored, the storage system may ignore one of the requests or may use both storage locations to increase data durability. After the storage system receives acknowledgments for the minimum number of write requests, control flows to block 420.
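The acknowledgment handling of block 410 can be sketched as follows. The sketch assumes acknowledgments arrive as (chunk identifier, device identifier) pairs in order of completion, and that a replicated chunk and its duplicate share the same chunk identifier; counting distinct chunk identifiers then covers both the erasure-coded case and the replication case. The function name collect_acks is hypothetical, and the sketch takes the option of ignoring a duplicate acknowledgment for a chunk that is already stored.

```python
def collect_acks(acks, minimum):
    """Return {chunk_id: device_id} once `minimum` distinct chunks are stored.

    acks: iterable of (chunk_id, device_id) in order of arrival.
    Returns None if the acknowledgments run out before the minimum is met.
    """
    stored = {}
    for chunk_id, device_id in acks:
        # Keep only the first (fastest) acknowledgment for each chunk.
        stored.setdefault(chunk_id, device_id)
        if len(stored) >= minimum:
            return stored
    return None


# Erasure-coded case: 3 of 4 chunks are enough; chunk 2 is the straggler.
coded = collect_acks([(0, "devA"), (3, "devD"), (1, "devB"), (2, "devC")], 3)

# Replication case: two chunks, each written twice. Two acknowledgments for
# chunk 0 do not satisfy the minimum; an acknowledgment for chunk 1 is needed.
replicated = collect_acks([(0, "devA"), (0, "devB"), (1, "devC")], 2)
```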

Control flowed to block 412 if the storage system determined at block 404 that redundant write requests should be generated reactively. At block 412, the storage system generates write requests for the chunks. The storage system may generate write requests for all or a portion of the chunks. For example, if the storage system generated n+k+k′ chunks, the storage system may initially generate write requests for just n+k chunks. In instances where the object is divided into chunks without encoding, the storage system generates a write request for each of the chunks as each chunk should be stored in the storage devices for later reconstruction of the object. After the storage system generates the write requests, control flows to block 414.

At block 414, the storage system sends the write requests to the storage devices. The storage system sends the write requests in a manner similar to that described at block 408. The storage system logs the storage devices which received write requests so that any redundant write requests that may be generated at block 418 may be sent to storage devices that did not initially receive a write request. After the storage system sends the write requests to the storage devices, control flows to block 416.

At block 416, the storage system determines whether all the write requests completed within a threshold period. After sending the write requests to the storage devices at block 414, the storage system begins tracking an amount of time until a write acknowledgment is received from each of the storage devices which received a write request. Once the amount of time is equal to or greater than the threshold period, the storage system determines the number of write requests that have failed to acknowledge. The time threshold is a configurable amount of time and may be adjusted according to how much delay for a write request is acceptable for the storage system or a particular use case. For example, the threshold may be 10 milliseconds. Additionally, the threshold may be dynamically adjusted for each write request based on the size of chunks being written, the load of the storage system, the number of writes in the write queues, and/or the type of storage devices in the storage system. For example, the threshold may increase in proportion to the number of write requests in the write queues. In some instances, the storage system may identify and log which storage devices completed or failed to complete their write request within the threshold. The storage system may use this information to prioritize which storage devices receive future write requests. For example, if a storage device failed to complete within the threshold on two consecutive write requests, the storage system may redirect write requests for that storage device to a different storage device. If the storage system determines that all the write requests completed within the time threshold, control flows to block 420. If the storage system determines that some of the write requests did not complete within the time threshold, control flows to block 418.
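The per-device logging described at block 416 might be sketched as a small tracker that counts consecutive missed thresholds. The class name SlowDeviceTracker and its methods are hypothetical; the limit of two consecutive misses follows the example in the text but would be configurable in practice.

```python
from collections import defaultdict


class SlowDeviceTracker:
    """Track consecutive write requests that missed the time threshold."""

    def __init__(self, max_consecutive_misses=2):
        self.max_consecutive_misses = max_consecutive_misses
        self.misses = defaultdict(int)

    def record(self, device_id, completed_in_time):
        # A timely completion resets the streak; a miss extends it.
        if completed_in_time:
            self.misses[device_id] = 0
        else:
            self.misses[device_id] += 1

    def should_redirect(self, device_id):
        # True once the device has missed the threshold often enough in a row.
        return self.misses.get(device_id, 0) >= self.max_consecutive_misses


tracker = SlowDeviceTracker()
tracker.record("devA", completed_in_time=False)
tracker.record("devA", completed_in_time=False)
tracker.record("devB", completed_in_time=False)
tracker.record("devB", completed_in_time=True)
```

In this usage, write requests for devA would be redirected to a different storage device, while devB recovers because its most recent write completed within the threshold.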

At block 418, the storage system generates redundant write requests and sends the redundant write requests to the storage devices. The number of redundant write requests is equal to the number of the write requests that failed to complete within the threshold. In some instances, the storage system may generate redundant write requests for the same chunks whose write requests failed to complete within the threshold. Alternatively, in other instances, the technique for generating the redundant write requests may vary based on how the chunks were generated at block 402. If, for example, the storage system used an n+k+k′ erasure code, the storage system may use the k′ chunks to generate redundant write requests. If, for example, the storage system used a rateless erasure code, the storage system may generate additional chunks and issue write requests for the additional chunks. After generating and sending the redundant write requests to the storage devices, control returns to block 416, and the storage system begins tracking an amount of time for the redundant write requests to acknowledge. Additional redundant write requests may be generated if the redundant write requests also fail to complete within the time threshold.
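The reactive path of blocks 416 and 418 might be sketched as below for the case in which the same chunks are re-sent. The function name plan_redundant_writes, the use of caller-supplied timestamps, and the list of spare devices (devices that did not initially receive a write request) are assumptions for illustration.

```python
def plan_redundant_writes(sent_at, acked, now, threshold, spare_devices):
    """Pair each late chunk with a spare device for a redundant write.

    sent_at: {chunk_id: send time in seconds}
    acked: set of chunk ids that have already acknowledged
    Returns a list of (chunk_id, device_id), one per late write request.
    """
    late = [chunk_id for chunk_id, sent in sent_at.items()
            if chunk_id not in acked and now - sent >= threshold]
    # zip stops early if there are fewer spare devices than late chunks.
    return list(zip(late, spare_devices))


# Hypothetical usage with a 10 millisecond threshold.
plan = plan_redundant_writes(
    sent_at={"c0": 0.0, "c1": 0.0, "c2": 0.0},
    acked={"c0"},
    now=0.012,
    threshold=0.010,
    spare_devices=["devX", "devY", "devZ"],
)
```

The number of planned redundant write requests equals the number of write requests that failed to acknowledge within the threshold, provided enough spare devices are available.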

Control flowed to block 420 if the storage system received write acknowledgments for the minimum number of write requests at block 410 or if the storage system determined that the write requests completed within the time threshold at block 416. At block 420, the storage system records locations of stored chunks along with an identifier for the object. The stored chunks are those chunks whose write requests acknowledged the quickest at block 410 or whose write requests completed within the time threshold at block 416. The location for each of the stored chunks includes an identifier for the storage device on which the chunk is stored and an address for the chunk. The storage system may record the locations in a database, a log or table maintained in memory, persistent storage, etc. The storage system associates the locations with the identifier for the object. The identifier may be a unique identifier assigned by the storage system or an identifier received with the request to store the object in the storage system. After the storage system records the locations of the stored chunks, control flows to block 422.

At block 422, the storage system cancels remaining write requests. The remaining write requests are those that failed to complete before the minimum number of write requests at block 410 or failed to complete within the time threshold at block 416. Since the remaining write requests are no longer needed, the storage system may cancel the remaining write requests by issuing a cancellation request to the associated storage devices, removing them from the write queues, etc. Additionally, if the remaining write requests complete, the storage system may mark storage locations for the remaining write requests as deleted or mark the storage locations for garbage collection. After the storage system cancels remaining write requests, the process ends.
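Blocks 420 and 422 might be sketched together as follows. The sketch assumes an in-memory dictionary serves as the index and that each write request is identified by a request identifier mapped to a (chunk identifier, device identifier, address) tuple; the function name finalize_store and these data shapes are hypothetical.

```python
def finalize_store(object_id, requests, completed, index):
    """Record locations of stored chunks and list the requests to cancel.

    requests: {request_id: (chunk_id, device_id, address)} for every write
    completed: request ids that acknowledged, in order of completion
    index: dict mapping object identifiers to lists of chunk locations
    """
    # Block 420: associate the stored chunk locations with the object.
    index[object_id] = [requests[request_id] for request_id in completed]
    # Block 422: every other request is no longer needed.
    done = set(completed)
    return [request_id for request_id in requests if request_id not in done]


index = {}
requests = {
    1: ("c0", "devA", 0),
    2: ("c1", "devB", 64),
    3: ("c2", "devC", 128),
}
to_cancel = finalize_store("obj-1", requests, completed=[3, 1], index=index)
```

The returned identifiers would then be cancelled at the storage devices or removed from the write queues; handling of requests that complete despite cancellation (marking locations for garbage collection) is omitted from the sketch.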

FIG. 5 depicts a flowchart that illustrates example operations for storing an object in a storage system using a dynamic number of redundant write requests. The operations depicted in FIG. 5 are described as being performed by a storage system, such as the example storage systems depicted in FIGS. 1, 2, and 3.

At block 500, the storage system receives a plurality of objects to store. The plurality of objects may be received concurrently or over a period of time. The storage system may perform the operations described at blocks 502-516 as objects are received over the period of time. After the storage system receives the plurality of objects, control flows to block 502.

At block 502, the storage system begins storage operations for the plurality of objects. An object of the plurality of objects currently being stored is hereinafter referred to as the selected object. After the storage system selects the selected object, control flows to block 504.

At block 504, the storage system processes the selected object to generate x chunks. The storage system may process the selected object to generate the x chunks in a manner similar to that described at block 402 of FIG. 4. For example, the storage system may generate the x chunks using an erasure code. The value of x is configurable and may be adjusted based on performance characteristics as described at block 514. When using an erasure code, the storage system may adjust the rate of the erasure code so that n+k or n+k+k′ is equal to the value of x or use a rateless erasure code to generate x chunks. After the storage system generates x chunks, control flows to block 506.

At block 506, the storage system generates write requests for the chunks. The storage system issues write requests for a minimum number of the chunks that should be stored in order to allow the selected object to be reconstructed at a later time. For example, if the storage system is configured to provide a rate of n+k for data durability, the number of chunks that should be stored, and therefore the minimum number of write requests, is equal to n+k. The number of write requests may not be equal to x in instances where additional chunks are generated and reserved for redundant write requests. For example, when x is equal to n+k+k′, the storage system generates n+k write requests and k′ chunks are reserved for potential redundant write requests. After the storage system generates the write requests, control flows to block 508.

At block 508, the storage system generates y redundant write requests for the chunks. As described in more detail at block 514, the storage system may adjust the value of y in response to performance characteristics of the write requests. In some instances, the value of y may be initially set to zero and may increase in response to slow write requests or degraded performance of the storage system. In instances where n+k+k′ chunks are generated, the value of y may initially be equal to k′ and may decrease in response to write requests consistently completing successfully. Additionally, the value of y may vary based on a number of storage devices in the storage system. For example, the value of y may increase as more storage devices are added to the storage system and decrease as the number of storage devices decreases. In instances where redundant write requests are reactively generated, the value of y varies based on the number of write requests that fail to acknowledge within a time threshold. As described in more detail at block 514, the storage system may also adjust the threshold in response to performance characteristics which may affect the value of y. After the storage system generates the y redundant write requests, control flows to block 510.

At block 510, the storage system sends write requests to the storage devices. The storage system sends the write requests and any redundant write requests to different storage devices. The write requests are sent in a manner similar to that described at block 408 of FIG. 4. After the storage system sends the write requests to the storage devices, control flows to block 512.

At block 512, the storage system records and analyzes performance characteristics related to storing the selected object. The performance characteristics can include information such as the time taken to complete each write request or a minimum number of write requests, which storage devices acknowledged the quickest, an amount of storage space in the storage devices, etc. In instances where redundant write requests are reactively generated, the performance characteristics can include a number of redundant write requests that were generated. The storage system records the performance characteristics so that performance characteristics related to storing the plurality of objects or performance characteristics related to each of the storage devices may be compared. The storage system may indicate in the performance characteristics the values of the threshold, x, and y so that performance patterns in relation to the values of the threshold, x, and y can be determined. For example, the storage system may determine that a certain combination of x and y values resulted in faster object storage times. The storage system may analyze the performance characteristics to determine if performance thresholds were met. For example, the storage system may ensure that the selected object was stored within an amount of time. After the storage system records and analyzes the performance characteristics, control flows to block 514.

At block 514, the storage system adjusts the values of x, y, and the threshold based on the performance characteristics. The storage system may use machine learning algorithms, such as classification algorithms or anomaly detection algorithms, to dynamically adjust the values of x, y, and the threshold. The storage system may also use machine learning algorithms to detect slow storage devices connected to the storage system and then prioritize those storage devices that provide consistently better performance for write requests at block 510. The values of x, y, and the threshold may also be adjusted based on specific performance characteristics. For example, the storage system may increase the number of redundant write requests y or decrease the time threshold in response to the selected object not storing within a desired period of time. Since the storage system uses the fastest writes to acknowledge, increasing the redundant write requests or decreasing the time threshold decreases the chances of delay in the storage process. The storage system may decrease the values of x and y and increase the value of the threshold in response to determining that additional overhead incurred when generating a high number of chunks and redundant write requests is degrading system performance. The storage system may also decrease y if there is a high number of pending write requests for the storage devices. The storage system may adjust the value of x in response to increasing or decreasing the value of y as more or fewer chunks may be needed for the redundant write requests. Conversely, the storage system may adjust the value of y in response to increasing or decreasing the value of x as more or fewer redundant write requests may be needed for the x chunks. To accommodate the adjusted x and y values, the storage system may also adjust the technique for generating chunks. For example, the storage system may adjust a rate for an erasure code to generate additional chunks or may adjust the size of the chunks. After the storage system adjusts the x and y values, control flows to block 516.
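The adjustment at block 514 might, in its simplest rule-based form, look like the sketch below. It is not the machine learning approach mentioned above; it only illustrates the direction of each adjustment. The function name adjust_parameters, the step of one redundant write request, and the halving and doubling of the threshold are assumptions chosen for this example.

```python
def adjust_parameters(y, threshold_ms, store_time_ms, target_ms,
                      pending_writes, pending_limit, y_max):
    """Return an adjusted (y, threshold_ms) pair after storing one object."""
    # Heavy queueing: back off redundancy and wait longer before reacting.
    if pending_writes > pending_limit:
        return max(0, y - 1), threshold_ms * 2
    # Object stored too slowly: add redundancy and react sooner.
    if store_time_ms > target_ms:
        return min(y_max, y + 1), threshold_ms / 2
    # Performance target met: leave the values unchanged.
    return y, threshold_ms


slow = adjust_parameters(2, 10.0, 30, 20, 5, 100, y_max=4)
overloaded = adjust_parameters(2, 10.0, 30, 20, 500, 100, y_max=4)
steady = adjust_parameters(2, 10.0, 15, 20, 5, 100, y_max=4)
```

A corresponding change to x (for example, x = n+k+y when each redundant write request uses its own encoded chunk) would follow from the adjusted y, as the paragraph above notes.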

At block 516, the storage system determines if there is an additional object in the plurality of objects. If there is an additional object, the storage system selects the additional object at block 502. If there is not an additional object, the process ends.

Variations

The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit the scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example, the operations depicted in blocks 420 and 422 of FIG. 4 can be performed in parallel, concurrently, in reverse order, etc. Additionally, the operations depicted in block 514 of FIG. 5 may not be performed in some iterations. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by program code. The program code may be provided to a processor of a general purpose computer, special purpose computer, or other programmable machine or apparatus.

Some operations above iterate through sets of items, such as objects. In some instances, objects may be iterated over according to a time they were received, a priority indicator, size, etc. Also, the number of iterations for loop operations may vary. Different techniques for processing and storing objects may require fewer iterations or more iterations. For example, objects may be stored in parallel, reducing the number of iterations. Additionally, a loop may not iterate for each object. For example, objects may be deduplicated.

In some instances in the description above, a redundant write request is generated in response to an initial write request not completing within a threshold. Instead of waiting for a threshold to expire, redundant write requests may also be generated in response to determining that a storage device is unable to complete a write request. A storage device may be unable to complete a write request if the storage device is full, malfunctioning, corrupted, non-operational, etc.

In the description above, redundant write requests and storage of objects are isolated to a single storage system; however, this may not always be the case. In some instances, objects may be stored across multiple storage systems, clusters, or nodes. Similarly, redundant write requests may span multiple storage systems, clusters, or nodes. The write manager or some bookkeeping component external to the storage system may track the various storage locations and update the index with information for locating the chunks.

The variations described above do not encompass all possible variations, aspects, or features. Other variations, modifications, additions, and improvements are possible.

The examples often refer to a “write manager.” The write manager is a construct used to refer to implementation of functionality for generating and monitoring writes of a storage system. This construct is utilized since numerous aspects or features are possible. A write manager may be a particular component or components of a machine (e.g., a particular circuit card enclosed in a housing with other circuit cards/boards), machine-executable program or programs, firmware, a circuit card with circuitry configured and programmed with firmware for generating write requests, etc. The term is used to efficiently explain content of the disclosure. The write manager can also be referred to as write controller, write component, write generator, etc. Although the examples refer to operations being performed by a write manager, different entities can perform different operations. For instance, a dedicated co-processor or application specific integrated circuit can generate and monitor writes of a storage system.

As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.

Any combination of machine readable medium(s) may be utilized. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine readable storage medium is not a machine readable signal medium.

A machine readable signal medium may include a propagated data signal with machine readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine readable signal medium may be any machine readable medium that is not a machine readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a machine readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of programming languages, including an object oriented programming language such as the Java® programming language, C++ or the like; a dynamic programming language such as Python; a scripting language such as Perl programming language or PowerShell script language; and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a stand-alone machine, may execute in a distributed manner across multiple machines, and may execute on one machine while providing results and/or accepting input on another machine.

The program code/instructions may also be stored in a machine readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

FIG. 6 depicts an example computer system with a write request manager. The computer system includes a processor 601 (possibly including multiple processors, multiple cores, multiple nodes, and/or implementing multi-threading, etc.). The computer system includes memory 607. The memory 607 may be system memory (e.g., cache, SRAM, DRAM, zero capacitor RAM, Twin Transistor RAM, eDRAM, EDO RAM, DDR RAM, EEPROM, NRAM, RRAM, SONOS, PRAM, etc.) or any of the above already described possible realizations of machine-readable media. The computer system also includes storage devices 609. The storage devices 609 may be local or remote storage (e.g., a hard disk or hard disk array, a diskette, an optical storage device, a magnetic storage device, Network Attached Storage (NAS), Storage Area Network (SAN), all flash array, RAID configured storage) or any of the above already described possible realizations of machine-readable media. The computer system also includes a bus 603 (e.g., PCI, ISA, PCI-Express, HyperTransport® bus, InfiniBand® bus, NuBus, etc.) and a network interface 605 (e.g., a Fiber Channel interface, an Ethernet interface, an internet small computer system interface, SONET interface, wireless interface, etc.). The system also includes a write request manager 611. The write request manager 611 generates redundant write requests and records locations for chunks of an object stored in a storage system. Any one of the previously described functionalities may be partially (or entirely) implemented in hardware and/or on the processor 601. For example, the functionality may be implemented with an application specific integrated circuit, in logic implemented in the processor 601, in a co-processor on a peripheral device or card, etc. Further, realizations may include fewer or additional components not illustrated in FIG. 6 (e.g., video cards, audio cards, additional network interfaces, peripheral devices, etc.). The processor 601 and the network interface 605 are coupled to the bus 603. Although illustrated as being coupled to the bus 603, the memory 607 may be coupled to the processor 601.

FIG. 7 depicts an example distributed storage system with a storage controller that includes a write request manager. The distributed storage system includes a storage controller 701, a metadata server 702, and storage devices 703 which are connected through a network 704. The network 704 may be a local area network or a wide area network and may include network devices such as routers, gateways, firewalls, switches, etc. The storage controller 701, the metadata server 702, and the storage devices 703 may communicate through the network 704 using various network protocols such as HTTP, iSCSI, and any of the above already described possible protocols. As indicated by the ellipses, the storage devices 703 may include any number of network connected storage devices in addition to the storage device A 703A and the storage device B 703B. The storage controller 701 includes a write request manager 705. The write request manager 705 generates redundant write requests that are sent through the network 704 to the storage devices 703. The write request manager 705 determines locations for chunks of an object stored throughout the distributed storage system and records the locations in the metadata server 702. The metadata server 702 may be a database or other storage device that maintains locations of objects stored in the distributed storage system.

While the aspects of the disclosure are described with reference to various features and exploitations, it will be understood that these aspects are illustrative and that the scope of the claims is not limited to them. In general, techniques for reducing write tail latency using redundant write requests as described herein may be implemented with facilities consistent with any hardware system or hardware systems. Many variations, modifications, additions, and improvements are possible.

Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.

What is claimed is:
 1. A method comprising: in response to receiving afirst object to be stored, generating a first set of data units based,at least in part, on the first object; sending each of the first set ofdata units with a write request to a different one of a plurality ofstorage devices; and in response to determining that a minimum number ofwrite requests have completed, identifying locations for a subset of thefirst set of data units which correspond to the completed writerequests; and updating an index with an identifier for the first objectand the locations of the subset of data units.
 2. The method of claim 1,further comprising: in response to determining that a completion periodfor the completed write requests exceeds a threshold, increasing aspecified number of data units to be generated in response to receivingan object to be stored; and in response to receiving a second object tobe stored, generating a second set of data units based, at least inpart, on the second object, wherein a number of data units in the secondset of data units is equal to the increased specified number of dataunits to be generated.
 3. The method of claim 2, wherein said generatingthe second set of data units based, at least in part, on the secondobject comprises: determining an erasure code used to generate the firstset of data units; and modifying the erasure code to generate a numberof data units equal to the increased specified number of data units tobe generated.
 4. The method of claim 1, further comprising: generating a second set of data units based, at least in part, on the first object; and sending each of the second set of data units with a write request to a different one of the plurality of storage devices; wherein said determining that the minimum number of write requests have completed comprises determining that a number of the completed write requests for the first set of data units added to a number of completed write requests for the second set of data units is equal to the minimum number of the write requests.
 5. The method of claim 4, wherein sending each of the second set of data units with a write request to a different one of the plurality of storage devices comprises: for each of the second set of data units, determining that a write request corresponding to one of the first set of data units failed to complete within a threshold; wherein the data unit is sent in response to determining that the write request corresponding to the one of the first set of data units failed to complete within the threshold.
 6. The method of claim 1, wherein the minimum number of the write requests is equal to a minimum number of data units required to reconstruct the first object, wherein a number of data units in the first set of data units is equal to or greater than a minimum number of data units required to reconstruct the first object.
 7. The method of claim 1, further comprising cancelling write requests corresponding to data units not in the subset of data units.
 8. A non-transitory machine readable medium having stored thereon instructions for storing an object comprising machine executable code which when executed by at least one machine, causes the machine to: in response to receipt of a first object to be stored, generate a first set of data units based, at least in part, on the first object; send each of the first set of data units with a write request to a different one of a plurality of storage devices; and in response to a determination that a minimum number of write requests have completed, identify locations for a subset of the first set of data units which correspond to the completed write requests; and update an index with an identifier for the first object and the locations of the subset of data units.
 9. The machine readable medium of claim 8, further comprising machine executable code which when executed by the machine, causes the machine to: in response to a determination that a completion period for the completed write requests exceeds a threshold, increase a specified number of data units to be generated in response to receiving an object to be stored; and in response to receipt of a second object to be stored, generate a second set of data units based, at least in part, on the second object, wherein a number of data units in the second set of data units is equal to the increased specified number of data units to be generated.
 10. The machine readable medium of claim 9, wherein the machine executable code which when executed by the machine, causes the machine to generate the second set of data units based, at least in part, on the second object comprise instructions to: determine an erasure code used to generate the first set of data units; and modify the erasure code to generate a number of data units equal to the increased specified number of data units to be generated.
 11. The machine readable medium of claim 8, further comprising machine executable code which when executed by the machine, causes the machine to: generate a second set of data units based, at least in part, on the first object; and send each of the second set of data units with a write request to a different one of the plurality of storage devices; wherein the machine executable code which when executed by the machine, causes the machine to determine that the minimum number of write requests have completed comprise machine executable code which when executed by the machine, causes the machine to determine whether a number of the completed write requests for the first set of data units added to a number of completed write requests for the second set of data units is equal to the minimum number of the write requests.
 12. The machine readable medium of claim 11, wherein the machine executable code which when executed by the machine, causes the machine to send each of the second set of data units with a write request to a different one of the plurality of storage devices comprise machine executable code which when executed by the machine, causes the machine to: for each of the second set of data units, determine whether a write request corresponding to one of the first set of data units failed to complete within a threshold; wherein the data unit is sent in response to a determination that the write request corresponding to the one of the first set of data units failed to complete within the threshold.
 13. The machine readable medium of claim 8, wherein the minimum number of the write requests is equal to a minimum number of data units required to reconstruct the first object, wherein a number of data units in the first set of data units is equal to or greater than a minimum number of data units required to reconstruct the first object.
 14. A computing device comprising: a processor; and a machine readable medium comprising machine executable code having stored thereon instructions executable by the processor to cause the computing device to: in response to receipt of a first object to be stored, generate a first set of data units based, at least in part, on the first object; send each of the first set of data units with a write request to a different one of a plurality of storage devices; and in response to a determination that a minimum number of write requests have completed, identify locations for a subset of the first set of data units which correspond to the completed write requests; and update an index with an identifier for the first object and the locations of the subset of data units.
 15. The computing device of claim 14, further comprising machine executable code executable by the processor to cause the computing device to: in response to a determination that a completion period for the completed write requests exceeds a threshold, increase a specified number of data units to be generated in response to receiving an object to be stored; and in response to receipt of a second object to be stored, generate a second set of data units based, at least in part, on the second object, wherein a number of data units in the second set of data units is equal to the increased specified number of data units to be generated.
 16. The computing device of claim 15, wherein the machine executable code executable by the processor to cause the computing device to generate the second set of data units based, at least in part, on the second object comprises program code executable by the processor to cause the apparatus to: determine an erasure code used to generate the first set of data units; and modify the erasure code to generate a number of data units equal to the increased specified number of data units to be generated.
 17. The computing device of claim 14, further comprising machine executable code executable by the processor to cause the computing device to: generate a second set of data units based, at least in part, on the first object; and send each of the second set of data units with a write request to a different one of the plurality of storage devices; wherein the machine executable code executable by the processor to cause the computing device to determine that the minimum number of write requests have completed comprises machine executable code executable by the processor to cause the computing device to determine whether a number of the completed write requests for the first set of data units added to a number of completed write requests for the second set of data units is equal to the minimum number of the write requests.
 18. The computing device of claim 17, wherein the machine executable code executable by the processor to cause the computing device to send each of the second set of data units with a write request to a different one of the plurality of storage devices comprises machine executable code executable by the processor to cause the computing device to: for each of the second set of data units, determine whether a write request corresponding to one of the first set of data units failed to complete within a threshold; wherein the data unit is sent in response to a determination that the write request corresponding to the one of the first set of data units failed to complete within the threshold.
 19. The computing device of claim 14, wherein the minimum number of the write requests is equal to a minimum number of data units required to reconstruct the first object, wherein a number of data units in the first set of data units is equal to or greater than a minimum number of data units required to reconstruct the first object.
 20. The computing device of claim 14, further comprising machine executable code executable by the processor to cause the computing device to cancel write requests corresponding to data units not in the subset of data units.
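
For illustration only, the flow recited in claims 1 and 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names write_chunk and store_object, the use of a thread pool, the simulated per-device delays, and the use of a device number as a chunk location are all hypothetical. Each data unit is sent to a different simulated device, the operation is treated as complete once a minimum number of writes finish, the locations of the completed writes are recorded in an index, and the remaining writes are cancelled on a best-effort basis.

```python
import concurrent.futures as cf
import time


def write_chunk(device_id, chunk, delay):
    """Hypothetical stand-in for a device write: sleeps to simulate latency."""
    time.sleep(delay)
    return device_id  # the device id serves as the chunk's location


def store_object(object_id, chunks, delays, min_writes, index):
    """Write each chunk to a different device; finish after min_writes succeed."""
    pool = cf.ThreadPoolExecutor(max_workers=len(chunks))
    futures = [
        pool.submit(write_chunk, device_id, chunk, delays[device_id])
        for device_id, chunk in enumerate(chunks)
    ]
    locations = []
    for future in cf.as_completed(futures):
        if future.exception() is None:
            locations.append(future.result())
        if len(locations) >= min_writes:
            break  # effectively complete; do not wait for stragglers
    # Best-effort cancellation: writes already running are not interrupted.
    for future in futures:
        future.cancel()
    pool.shutdown(wait=False)
    if len(locations) < min_writes:
        raise IOError(f"only {len(locations)} of {min_writes} writes completed")
    # Update the index with the object identifier and the chunk locations.
    index[object_id] = sorted(locations)
    return index[object_id]
```

With sixteen data units, a minimum of twelve, and two simulated slow devices, the call returns as soon as twelve of the fast writes finish, so the slow devices do not determine the completion time of the storage operation.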